1 Problem: Sifting through the noise in Online Dating

Modern dating platforms have drastically increased the number of potential matches available to singles. With this selection, however, comes the paradox of choice: abundance quickly turns from an opportunity into a burden. Swiping endlessly through images and profiles to find the needle in the haystack is exhausting.

2 Solution: Image Query and Analysis Engine

Quantitative image analytics can process millions of images in seconds, filtering out the noise and keeping only the desired results.

Montage \(\rightarrow\) Find winking with black hair \(\rightarrow\) Winner
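To make the query idea concrete, here is a minimal sketch in plain Python. It assumes that an earlier analysis step has already reduced each image to a few attributes; the `FaceRecord` class, its fields, and the sample data are hypothetical and are not part of the Spark Image Layer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FaceRecord:
    """Hypothetical per-image summary produced by an upstream analysis step."""
    image_id: str
    hair_color: str   # e.g. "black", "brown", "blond"
    is_winking: bool


def find_matches(records, hair_color, winking):
    """Return the records whose attributes satisfy the query."""
    return [r for r in records
            if r.hair_color == hair_color and r.is_winking == winking]


# Made-up example data standing in for the montage of profile images.
montage = [
    FaceRecord("img_001", "brown", False),
    FaceRecord("img_002", "black", True),
    FaceRecord("img_003", "black", False),
    FaceRecord("img_004", "blond", True),
]

# "Find winking with black hair"
winners = find_matches(montage, hair_color="black", winking=True)
print([r.image_id for r in winners])
```

Once the images are described by attributes like these, the query itself is an ordinary filter; the hard part is the image analysis that produces the attributes.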

2.1 Real-time

More important than any single query is the ability to run queries on complex datasets in real time, with the processing distributed over a number of machines.

Faces \(\rightarrow\) Hulls \(\rightarrow\) Hulls

2.2 How?

The first question is how the data can be processed. The basic work is done by a simple workflow on top of our Spark Image Layer. This abstracts away the complexities of cloud computing and distributed analysis. You focus only on the core task of image processing.

The true value of such a scalable system is not in the single analysis, but in the ability to analyze hundreds, thousands, and even millions of samples at the same time.

With cloud integration and Big Data-based frameworks, even an entire city network with hundreds of drones and cameras running continuously is easy to handle, with no need to worry about networks, topology, or fault tolerance.
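The pattern behind this kind of scaling can be sketched as follows: the same query runs independently on each partition of the data and the partial results are merged. This is only an illustration under stated assumptions. A local thread pool from Python's standard library stands in for a cluster, and the function names and data are hypothetical rather than the Spark Image Layer API.

```python
from concurrent.futures import ThreadPoolExecutor


def count_matches(partition, predicate):
    """Count the records in one partition that satisfy the predicate."""
    return sum(1 for record in partition if predicate(record))


def distributed_count(partitions, predicate, max_workers=4):
    """Apply the same query to every partition in parallel and merge the counts."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partial_counts = pool.map(lambda p: count_matches(p, predicate), partitions)
        return sum(partial_counts)


# Made-up data: each inner list stands in for the images held by one machine.
partitions = [
    [{"hair": "black", "winking": True}, {"hair": "brown", "winking": False}],
    [{"hair": "black", "winking": False}],
    [{"hair": "black", "winking": True}, {"hair": "black", "winking": True}],
]

total = distributed_count(
    partitions, lambda r: r["hair"] == "black" and r["winking"]
)
print(total)
```

Because each partition is processed without reference to the others, adding machines adds throughput; a real cluster framework additionally handles scheduling, data movement, and recovery from failed workers.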

2.3 What?

The images come from one or more dating sites or apps in the form of a real-time stream.

  • Labels

The first step is to identify the background and the region for the face inside of the image.

  • Background

The second is to selectively enhance the features of the face itself so that they can be quantified and analyzed more directly.

  • Edges
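The two steps above can be illustrated on a toy image. The sketch below assumes a grayscale image stored as a nested list, separates foreground from background with a fixed threshold, and measures edge strength with simple neighbor differences. These are deliberately basic stand-ins chosen for illustration, not the methods used in the actual pipeline, and every name here is hypothetical.

```python
def foreground_mask(image, threshold):
    """Mark pixels brighter than the threshold as foreground (1), others as background (0)."""
    return [[1 if px > threshold else 0 for px in row] for row in image]


def bounding_box(mask):
    """Return (top, left, bottom, right) of the foreground pixels, or None if there are none."""
    coords = [(y, x) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not coords:
        return None
    ys = [y for y, _ in coords]
    xs = [x for _, x in coords]
    return min(ys), min(xs), max(ys), max(xs)


def edge_strength(image):
    """Approximate edge strength as the sum of absolute differences to the
    right-hand and lower neighbors; the last row and column are left at 0."""
    height, width = len(image), len(image[0])
    edges = [[0] * width for _ in range(height)]
    for y in range(height - 1):
        for x in range(width - 1):
            dx = abs(image[y][x + 1] - image[y][x])
            dy = abs(image[y + 1][x] - image[y][x])
            edges[y][x] = dx + dy
    return edges


# Made-up 5x5 grayscale image: a bright 3x3 "face" on a dark background.
image = [
    [0, 0, 0, 0, 0],
    [0, 9, 9, 9, 0],
    [0, 9, 9, 9, 0],
    [0, 9, 9, 9, 0],
    [0, 0, 0, 0, 0],
]

mask = foreground_mask(image, threshold=4)   # background vs. face region
box = bounding_box(mask)                     # region containing the face
edges = edge_strength(image)                 # high values along the face outline
print(box)
```

The interior of the bright region has zero edge strength while its outline does not, which is what makes edges a useful feature to quantify.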

2.4 Machine Learning

The quantitatively meaningful data can then be used to train machine learning algorithms (from decision trees to support vector machines) that learn from previous successes and failures.

Here we show a simple decision tree trained to distinguish good from bad on the basis of color, position, texture, and shape.

Classification Tree (Whole)

Furthermore, the ability to parallelize and scale means that thousands to millions of images and profiles can be analyzed at the same time to learn even more about your preferences.
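As a rough illustration of the learning step described in this section, the sketch below trains a one-level decision tree (a decision stump) on made-up feature vectors. It is a simplification of a full classification tree, and the feature values, labels, and function names are assumptions introduced for the example.

```python
def train_stump(samples, labels):
    """Find the (feature, threshold, label_above) split with the fewest training errors.

    A sample is predicted as `label_above` when its feature value exceeds the
    threshold, and as the opposite label otherwise.
    """
    best = None
    n_features = len(samples[0])
    for feature in range(n_features):
        for threshold in sorted({s[feature] for s in samples}):
            for label_above in (True, False):
                errors = sum(
                    ((s[feature] > threshold) == label_above) != y
                    for s, y in zip(samples, labels)
                )
                if best is None or errors < best[0]:
                    best = (errors, feature, threshold, label_above)
    _, feature, threshold, label_above = best
    return feature, threshold, label_above


def predict(stump, sample):
    """Classify one sample with a trained stump."""
    feature, threshold, label_above = stump
    return (sample[feature] > threshold) == label_above


# Made-up training data: (color score, texture score) per image,
# with True meaning a previous success and False a previous failure.
samples = [(0.9, 0.2), (0.8, 0.7), (0.3, 0.4), (0.2, 0.9)]
labels = [True, True, False, False]

stump = train_stump(samples, labels)
print(stump)
print(predict(stump, (0.85, 0.5)))
```

A full decision tree repeats this split search recursively on each resulting subset, which is how it can combine color, position, texture, and shape in a single model.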

3 Technical Aspects

To find out more about the technical aspects of our solution, check out our presentation at the Spark Summit or watch the video.

Check out our other demos to see how 4Quant can help you.

4 Acknowledgements

The images were provided by the Yale Face Database, hosted by the Computer Vision Group of UCSD. Analysis is powered by Spark Image Layer from 4Quant. Visualizations, document generation, and maps are provided by:

To cite ggplot2 in publications, please use:

H. Wickham. ggplot2: elegant graphics for data analysis. Springer New York, 2009.

A BibTeX entry for LaTeX users is

@Book{,
  author = {Hadley Wickham},
  title = {ggplot2: elegant graphics for data analysis},
  publisher = {Springer New York},
  year = {2009},
  isbn = {978-0-387-98140-6},
  url = {http://had.co.nz/ggplot2/book},
}

To cite package ‘leaflet’ in publications use:

Joe Cheng and Yihui Xie (2014). leaflet: Create Interactive Web Maps with the JavaScript LeafLet Library. R package version 0.0.11. https://github.com/rstudio/leaflet

A BibTeX entry for LaTeX users is

@Manual{,
  title = {leaflet: Create Interactive Web Maps with the JavaScript LeafLet Library},
  author = {Joe Cheng and Yihui Xie},
  year = {2014},
  note = {R package version 0.0.11},
  url = {https://github.com/rstudio/leaflet},
}

To cite plyr in publications use:

Hadley Wickham (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1-29. URL http://www.jstatsoft.org/v40/i01/.

A BibTeX entry for LaTeX users is

@Article{,
  title = {The Split-Apply-Combine Strategy for Data Analysis},
  author = {Hadley Wickham},
  journal = {Journal of Statistical Software},
  year = {2011},
  volume = {40},
  number = {1},
  pages = {1--29},
  url = {http://www.jstatsoft.org/v40/i01/},
}

To cite the ‘knitr’ package in publications use:

Yihui Xie (2015). knitr: A General-Purpose Package for Dynamic Report Generation in R. R package version 1.10.

Yihui Xie (2013) Dynamic Documents with R and knitr. Chapman and Hall/CRC. ISBN 978-1482203530

Yihui Xie (2014) knitr: A Comprehensive Tool for Reproducible Research in R. In Victoria Stodden, Friedrich Leisch and Roger D. Peng, editors, Implementing Reproducible Computational Research. Chapman and Hall/CRC. ISBN 978-1466561595

To cite package ‘rmarkdown’ in publications use:

JJ Allaire, Joe Cheng, Yihui Xie, Jonathan McPherson, Winston Chang, Jeff Allen, Hadley Wickham and Rob Hyndman (2015). rmarkdown: Dynamic Documents for R. R package version 0.7. http://CRAN.R-project.org/package=rmarkdown

A BibTeX entry for LaTeX users is

@Manual{,
  title = {rmarkdown: Dynamic Documents for R},
  author = {JJ Allaire and Joe Cheng and Yihui Xie and Jonathan McPherson and Winston Chang and Jeff Allen and Hadley Wickham and Rob Hyndman},
  year = {2015},
  note = {R package version 0.7},
  url = {http://CRAN.R-project.org/package=rmarkdown},
}

To cite package ‘DiagrammeR’ in publications use:

Knut Sveidqvist, Mike Bostock, Chris Pettitt, Mike Daines, Andrei Kashcha and Richard Iannone (2015). DiagrammeR: Create Graph Diagrams and Flowcharts Using R. R package version 0.7.

A BibTeX entry for LaTeX users is

@Manual{,
  title = {DiagrammeR: Create Graph Diagrams and Flowcharts Using R},
  author = {Knut Sveidqvist and Mike Bostock and Chris Pettitt and Mike Daines and Andrei Kashcha and Richard Iannone},
  year = {2015},
  note = {R package version 0.7},
}